[Cohere] Add cohere2_moe model support by Terrencezzj · Pull Request #1340 · ml-explore/mlx-lm

Terrencezzj · 2026-06-02T20:44:17Z

[Cohere] Add cohere2_moe model support

Adds cohere2_moe architecture support to mlx-lm.
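For readers new to mixture-of-experts layers, here is a NumPy sketch of generic top-k routing. A router scores each token against every expert, the top_k experts are kept, their gate weights are renormalized with a softmax, and the expert outputs are summed using those weights. This illustrates the general technique only and is not the cohere2_moe implementation in this PR. The function `moe_forward`, the `top_k` default, and softmax-over-selected-experts gating are assumptions. Shared experts and load balancing are omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)    # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def moe_forward(x, router_w, experts, top_k=2):
    """Generic top-k mixture-of-experts layer (hypothetical, illustrative only).

    x:        (tokens, d_model) activations
    router_w: (d_model, n_experts) router projection
    experts:  list of callables, each mapping (n, d_model) -> (n, d_model)
    """
    logits = x @ router_w                              # (tokens, n_experts)
    # Indices of the top_k highest-scoring experts for each token.
    top_idx = np.argsort(-logits, axis=-1)[:, :top_k]
    top_logits = np.take_along_axis(logits, top_idx, axis=-1)
    # Renormalize gate weights over the selected experts only (assumed gating).
    gates = softmax(top_logits, axis=-1)               # (tokens, top_k)

    out = np.zeros_like(x)
    for e, expert in enumerate(experts):
        # (token, slot) pairs routed to expert e; each token appears at most once.
        tok, slot = np.nonzero(top_idx == e)
        if tok.size == 0:
            continue
        out[tok] += gates[tok, slot][:, None] * expert(x[tok])
    return out
```

With identity experts and top_k equal to the number of experts, the gates sum to 1, so the layer returns its input unchanged. That makes a convenient sanity check.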
The PR also adds loading support for compressed-tensors W4A16 checkpoints (4-bit weights, 16-bit activations), so quantized Cohere2 MoE checkpoints can run on MLX.
Adds test_cohere2_moe to tests/test_models.py, covering model construction and the generation cache.
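The NumPy sketch below shows one way W4A16 weights can be stored and recovered. It assumes a compressed-tensors "pack-quantized"-style layout, which this PR does not confirm. Signed int4 values in [-8, 7] are offset by +8 and packed eight per 32-bit word, lowest nibble first. Each group of `group_size` input columns has one symmetric scale and no zero point. `pack_int4`, `unpack_int4`, `dequantize`, and `to_affine` are hypothetical helpers. `to_affine` shows the algebra for rewriting the same weights as scale * u + bias with unsigned u, which is the affine form MLX uses for its own quantized weights. The description does not state whether the PR's loader does exactly this.

```python
import numpy as np

NUM_BITS = 4
PACK = 32 // NUM_BITS          # 8 int4 values per 32-bit word
OFFSET = 1 << (NUM_BITS - 1)   # 8: shifts signed [-8, 7] to unsigned [0, 15]

def pack_int4(q):
    """Pack signed int4 values (out, in) into int32 words (out, in // 8),
    lowest nibble first (assumed layout)."""
    u = (q.astype(np.int64) + OFFSET).astype(np.uint32)
    u = u.reshape(q.shape[0], -1, PACK)                 # 8 consecutive columns per word
    shifts = np.arange(PACK, dtype=np.uint32) * NUM_BITS
    words = np.bitwise_or.reduce(u << shifts, axis=-1)
    return words.view(np.int32)

def unpack_int4(packed):
    """Inverse of pack_int4: int32 (out, in // 8) -> signed int8 (out, in)."""
    u = packed.view(np.uint32)[..., None]
    shifts = np.arange(PACK, dtype=np.uint32) * NUM_BITS
    nibbles = (u >> shifts) & 0xF                       # (out, in // 8, 8)
    q = nibbles.reshape(packed.shape[0], -1).astype(np.int16) - OFFSET
    return q.astype(np.int8)

def dequantize(packed, scale, group_size):
    """W = scale * q, with one scale per (row, group of group_size columns)."""
    q = unpack_int4(packed).astype(np.float32)
    s = np.repeat(scale.astype(np.float32), group_size, axis=1)
    return q * s

def to_affine(scale):
    """Rewrite symmetric w = s * q (signed q) as w = s * u + b,
    where u = q + 8 is unsigned, which gives b = -8 * s."""
    return scale, -OFFSET * scale
```

Folding the offset into a per-group bias lets signed symmetric weights be expressed in an unsigned affine scheme without changing the dequantized values.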

Test plan

python -m pip install -e .
python -m mlx_lm generate \
  --model /path/to/cohere2_moe_nvfp4 \
  --prompt "Solve this coding problem: given strings s and t, return the minimum number of characters to append to s so t becomes a subsequence." \
  --max-tokens 32768 \
  --temp 0.6 \
  --top-p 0.95 \
  --top-k 0
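The test plan samples with --temp 0.6, --top-p 0.95, and --top-k 0. To illustrate what those settings do, here is a NumPy sketch of temperature scaling followed by optional top-k and nucleus (top-p) filtering. It assumes --top-k 0 means "no top-k filtering", which is the common convention. It is not mlx_lm's sampler, and `sample_next` is a hypothetical name.

```python
import numpy as np

def sample_next(logits, temp=0.6, top_p=0.95, top_k=0, rng=None):
    """Draw one token id from 1-D `logits` using temperature, optional
    top-k (0 = disabled, assumed convention) and nucleus (top-p) filtering."""
    rng = np.random.default_rng() if rng is None else rng
    if temp == 0:
        return int(np.argmax(logits))          # greedy decoding
    z = logits.astype(np.float64) / temp
    order = np.argsort(-z)                     # token ids, most likely first
    if top_k > 0:
        order = order[:top_k]
    p = np.exp(z[order] - z[order].max())
    p /= p.sum()
    # Keep the smallest prefix whose cumulative probability reaches top_p.
    cutoff = int(np.searchsorted(np.cumsum(p), top_p)) + 1
    order, p = order[:cutoff], p[:cutoff]
    p /= p.sum()
    return int(rng.choice(order, p=p))
```

With a peaked distribution such as logits [10, 0, -10, -20], the top token alone holds more than 95% of the probability mass after temperature scaling. The nucleus then shrinks to a single token, and sampling becomes effectively greedy.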

Terrencezzj added 3 commits on June 2, 2026 (16:27):

- 15e84be cohere2_moe support
- c8046b2 fix typo
- f43507c scale-folding

nastya236 added the enhancement label (New feature or request) on Jun 6, 2026.


Terrencezzj wants to merge 3 commits into ml-explore:main from Terrencezzj:cohere2_moe

Terrencezzj commented Jun 2, 2026
